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ABSTRACT 


Existing  approaches  to  visual  Simultaneous  Localization  and  Mapping  (SLAM) 
typically  utilize  points  as  visual  feature  primitives  to  represent  landmarks  in  the 
environment.  Since  these  techniques  mostly  use  image  points  from  a  standard 
feature  point  detector,  they  do  not  explicitly  map  objects  or  regions  of  interest. 
Lurther,  previous  SLAM  techniques  that  propose  the  use  of  higher  level  structures 
often  place  constraints  on  the  environment,  such  as  requiring  orthogonal  lines  and 
planes.  Our  work  is  motivated  by  the  need  for  different  SLAM  techniques  in  path 
and  riverine  settings,  where  feature  points  can  be  scarce  and  may  not  adequately 
represent  the  environment.  Accordingly,  the  proposed  approach  uses  Bezier  poly¬ 
nomial  curves  as  stereo  vision  primitives  and  offers  a  novel  SLAM  formulation  to 
update  the  curve  parameters  and  vehicle  pose.  This  method  eliminates  the  need 
for  point-based  stereo  matching,  with  an  optimization  procedure  to  directly  ex¬ 
tract  the  curve  information  in  the  world  frame  from  noisy  edge  measurements. 
Lurther,  the  proposed  algorithm  enables  navigation  with  fewer  feature  states  than 
most  point-based  techniques,  and  is  able  to  produce  a  map  which  only  provides 
detail  in  key  areas.  Results  in  simulation  and  with  vision  data  validate  that  the 
proposed  method  can  be  effective  in  estimating  the  6DOL  pose  of  the  stereo  cam¬ 
era,  and  can  produce  structured,  uncluttered  maps.  Monte  Carlo  simulations  of 
the  algorithm  are  also  provided  to  analyze  its  consistency. 
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CHAPTER  1 


INTRODUCTION 


A  necessary  capability  for  any  autonomous  vehicle  is  to  progressively  construct  a 
map  of  its  surroundings  whilst  localizing  itself  within  the  map,  a  real-time  process 
known  as  the  SLAM  problem  [1]  [2] .  GPS  has  long  been  used  to  localize  aerial 
vehicles  in  various  applications,  but  may  not  always  be  available  in  remote  areas, 
in  outdoor  environments  that  have  heavy  forest  canopies,  or  even  in  some  indoor 
environments.  There  exist  a  variety  of  alternative  sensors  that  can  be  used  to  deter¬ 
mine  the  range  and  bearing  to  landmarks  in  the  environment,  including  laser  range 
finders,  RADAR  /  LIDAR  systems,  ultrasonic  sensors,  and  stereo  and  monocular 
cameras.  Nevertheless,  favorable  size,  mass,  and  power  consumption  qualities  of 
lightweight  cameras  make  them  very  attractive  for  autonomous  navigation. 

A  great  deal  of  past  work  has  focused  on  visual  SLAM.  For  example,  the  orig¬ 
inal  MonoSLAM  algorithm  [3]  was  able  to  estimate  pose  of  a  monocular  camera 
and  map  a  room,  and  indeed,  our  previous  work  [4]  [5]  [6]  achieved  navigation 
and  mapping  using  a  single  camera  in  an  orthogonal  indoor  environment.  How¬ 
ever,  like  most  SLAM  techniques,  such  prior  works  utilize  points  as  landmarks, 
an  approach  with  a  number  of  drawbacks: 

1.  The  landmarks  represent  image  points,  and  may  have  no  physical  signifi¬ 
cance  in  the  environment 

2.  There  is  no  exploitation  of  higher  level  structure  in  the  environment 

3.  Methods  that  can  provide  rich  maps  [7]  require  a  much  larger  state  space, 
and  combinatorial  data  association  is  considerably  more  difficult. 

From  the  perspective  of  robot  motion  planning,  guidance,  and  control,  it  is  de¬ 
sirable  for  the  produced  map  to  have  some  structure  and  for  the  landmarks  to  rep¬ 
resent  meaningful  physical  objects.  Otherwise,  there  is  no  indication  of  whether 
landmarks  represent  obstacles,  traversable  regions,  or  points  of  interest  in  the  en¬ 
vironment. 
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Before  presenting  the  curve -based  SLAM  algorithm,  we  begin  by  examining 
related  work  in  Visual  SLAM. 


1 . 1  Related  Work 

Monocular  vision  is  a  difficult  problem,  in  part  because  the  projective  geometry 
means  that  depth  of  a  landmark  along  the  axis  of  the  camera  (i.e.,  distance  from  the 
camera)  cannot  be  estimated  from  a  single  measurement.  Early  research  solved 
this  problem  by  initializing  the  landmark  after  multiple  measurements  were  made 
[3],  while  more  recent  approaches  use  an  Inverse  Depth  Parametrization  (IDP)  to 
initialize  the  landmark  after  a  single  observation  [8].  Other  monocular  approaches 
utilize  the  ground  planar  constraint  of  different  environments  to  immediately  ini¬ 
tialize  landmarks  [4]  [6],  but  the  methods  still  require  a  height  measurement  from 
an  altimeter  sensor.  Another  technique  [9]  allows  the  MonoSLAM  method  to  be 
applied  to  larger  scale  environments,  by  starting  new  submaps  once  the  original 
map  grows  to  an  unmanageable  size,  and  then  stitching  the  local  maps  together 
into  a  global  map  using  the  Hierarchal  SLAM  algorithm  [10].  Their  results  show 
a  successful  loop  closure  in  a  dynamic  urban  environment,  producing  an  accurate 
global  map. 

Stereo  vision-based  methods  allow  immediate  initialization  of  landmarks,  but 
can  produce  erroneous  estimates  for  distant  landmarks  [1 1].  This  can  be  improved 
by  modelling  the  measurements  as  separate  monocular  measurements  rather  than 
a  single  stereo  frame  [12].  Visual  odometry  techniques  have  been  widely  ex¬ 
plored,  but  can  suffer  from  considerable  drift,  analogous  to  the  drift  of  standard 
vehicle  odometry.  Recent  research  looks  instead  to  combine  visual  odometry  with 
monocular  SLAM  [13],  using  the  “pose  prior”  obtained  from  visual  odometry 
to  strengthen  the  Visual  SLAM  result.  Structure  from  Motion  (SFM)  techniques 
such  as  Bundle  Adjustment  (BA),  can  produce  a  very  high  level  of  accuracy,  but 
often  at  greater  computation  expense.  However,  there  has  been  recent  research  al¬ 
lowing  for  a  real-time  implementation  using  a  sliding  window  bundle  adjustment 
[14],  and  further  work  using  the  random  sample  consensus  (RANSAC)  algorithm 
and  and  EKF  to  optimize  over  a  sliding  window  [15]. 

However,  the  algorithms  outlined  thus  far,  both  mono-based  and  stereo-based, 
use  feature  points  as  landmarks.  Consequently,  they  are  susceptible  to  the  draw¬ 
backs  outlined  earlier,  and  it  is  beneficial  to  consider  using  an  approach  that  rec- 
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ognizes  structure  in  the  environment. 


1.1.1  Higher  Level  Structure  in  Vision  and  SLAM 

A  number  of  related  works  have  attempted  to  discover  or  utilize  higher  level  struc¬ 
ture  in  SLAM.  Some  of  these  assume  an  orthogonal  environment  and  utilize  lines 
or  planes  to  obtain  the  SLAM  estimate  [16]  [17]  [18].  However,  the  work  in  [16] 
relies  on  dense  landmark  sets  (such  as  a  laser  scan  dataset)  and  only  landmarks 
that  are  collinear  or  coplanar  can  be  used  in  the  SLAM  pose  estimate,  as  is  the 
case  in  [17].  The  approach  outlined  in  [18]  allows  for  higher  structure  in  the  en¬ 
vironment  using  line  segments  that  can  be  initialized  immediately  (analogous  to 
IDP  for  points),  but  still  requires  a  larger  state  space  than  would  be  needed  for 
a  curve-based  approach.  Another  related  work  [19]  parametrizes  3D  lines  using 
Pliicker  coordinates,  suited  to  a  pinhole  camera  projection  model,  and  uses  these 
in  an  EKF-based  SLAM  framework,  while  the  work  of  [20]  utilizes  the  orthogo¬ 
nality  assumption  of  the  indoor  environment  to  map  lines  on  the  floor. 

Some  past  works  also  combine  object  recognition  with  visual  SLAM  [21]  [22], 
incorporating  objects  of  interest  into  the  map.  However,  the  work  in  [21]  utilizes 
an  IR  scanner  as  well,  while  [22]  uses  orthogonal  household  objects  rather  than 
a  more  freeform  primitive.  In  both  works,  the  estimation  algorithm  is  still  point- 
based,  and  the  object  recognition  algorithm  is  only  used  to  cluster  points  and 
improve  data  association. 

The  use  of  curve  structures  in  vision  has  also  been  considered  in  prior  research. 
A  number  of  early  works  propose  the  use  of  algebraic  curve  primitives  in  stereo 
vision  rather  than  points  [23]  [24],  deriving  closed  form  implicit  expressions  for 
an  algebraic  curve  in  Cartesian  space  given  its  projection  in  multiple  images. 
However,  while  algebraic  curves  can  more  diversely  describe  a  range  of  image 
shapes,  they  are  difficult  to  use  computationally  since  only  certain  polynomial  va¬ 
rieties  admit  a  parametric  representation.  Another  work  [25]  also  examines  the 
multi-view  relationships  for  non-algebraic  curves,  lines  and  conic  sections,  while 
a  more  recent  approach  utilizes  an  iterative  technique  with  a  Non-Uniform  Ra¬ 
tional  B-Spline  (NURBS)  model  to  perform  optimal  stereo  reconstruction  of  3D 
curves  [26].  A  similar  technique  reconstructs  smooth  space  curves  with  images 
of  objects  from  different  views  [27],  but  assumes  that  the  relative  motion  between 
cameras  is  known,  rendering  it  unsuitable  for  SLAM.  Another  SFM-style  algo- 
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rithm  performs  multi- view  reconstruction  of  a  set  of  curves,  and  impressive  results 
are  presented  with  a  few  different  image  sequences,  including  top-down  imagery 
from  an  aerial  vehicle  [28].  However,  as  a  batch  update  algorithm  which  requires 
the  full  sequence  of  images,  it  cannot  be  directly  applied  to  real-time  robot  naviga¬ 
tion.  More  significantly,  it  only  considers  curves  that  are  fully  observed  in  several 
image  frames;  for  SFM  this  may  be  reasonable,  but  in  SLAM,  partial  curves  are 
frequently  observed  and  need  to  be  incorporated  as  measurements.  One  approach 
[29]  uses  basis  curves  in  the  application  of  road  lane  detection,  but  only  considers 
the  scenario  of  a  ground  vehicle  travelling  on  a  road  with  marked  lanes,  whereas 
a  more  general  method  in  6DOF  is  desirable  for  our  application. 

The  most  relevant  prior  work  is  that  of  Pedraza  et  al.  [30] [31],  in  which  B- 
splines  are  applied  to  SLAM  by  using  the  control  points  of  the  curves  as  a  de¬ 
scription  of  the  environment.  They  propose  an  EKF-SLAM  based  framework  for 
navigation,  with  excellent  results  in  multiple  indoor  environments.  However,  the 
algorithm  is  applied  to  a  laser-based  SLAM  system,  and  the  observation  model 
utilizes  point  measurements  in  3D  position  co-ordinates.  Further  work  improves 
the  observation  covariance  of  the  algorithm  and  yields  more  consistent  results 
with  a  spline-based  observation  model  [32]. 

Our  proposed  algorithm  looks  to  build  on  their  work  in  a  number  of  ways. 
Firstly,  our  algorithm  generalizes  their  approach  to  6DOF,  allowing  for  application 
in  non-planar  vehicles.  Secondly,  our  application  to  Visual  SLAM  means  that  the 
algorithm  could  operate  in  a  range  of  environments  where  laser  ranging  finding 
returns  may  not  be  available,  such  as  in  outdoor  terrain.  In  doing  so,  we  also 
provide  a  novel  measurement  parametrization  that  avoids  feature  points  altogether 
in  visual  SLAM:  the  optimal  curve  parameters  are  found  from  a  stereo  image  pair, 
and  these  curve  parameters  are  used  as  the  measurements  in  SLAM.  Lastly,  while 
Pedraza  et  al.  utilize  odometry  in  their  approach  (a  reasonable  assumption  for  a 
ground  vehicle),  our  proposed  technique  utilizes  vision  as  the  only  sensory  input, 
thereby  ensuring  it  can  be  used  onboard  any  robotic  platform,  or  indeed,  a  simple 
stereo  camera  rig. 

1.1.2  Summary  and  Contributions 

A  large  amount  of  related  work  exists  in  visual  navigation  and  SLAM,  but  in  gen¬ 
eral,  most  approaches  utilize  point  features  without  regard  to  the  structure  of  the 
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environment.  Clearly,  there  are  drawbacks  to  utilizing  point-based  methods  in  vi¬ 
sual  SLAM,  and  these  characteristics  limit  the  efficacy  of  a  SLAM  algorithm.  By 
parametrizing  landmarks  in  a  way  that  explicitly  considers  higher  level  structures, 
it  will  be  possible  to  produce  structured  maps  with  more  meaningful  landmarks, 
which  could  provide  more  useful  information  to  the  vehicle  to  aid  path  planning 
and  control. 

A  number  of  past  works  have  looked  into  utilizing  higher  level  structure  for 
SLAM,  but  most  of  these  place  orthogonality  constraints  on  the  environment  or 
still  utilize  points,  clustering  them  for  object  classification.  A  few  works  have 
looked  into  curve  structures  in  vision,  but  mostly  do  not  consider  their  applica¬ 
tion  to  SLAM.  The  work  of  [31]  [32]  develops  a  spline-based  SLAM  framework, 
but  this  is  only  for  application  to  LIDAR-based  SLAM  for  a  planar  vehicle,  and 
vehicle  odometry  is  also  used. 

The  work  presented  in  this  thesis  uses  a  Bezier  curve -based  landmark  parametriza- 
tion  in  a  stereo  vision-based  SLAM  framework.  The  proposed  algorithm  aims 
to  achieve  accurate  localization  and  structured  mapping  by  using  cubic  Bezier 
polynomial  curves  [33]  as  stereo  vision  primitives  instead  of  feature  points.  This 
results  in  detailed  continuous  curves  mapped  through  considerably  fewer  land¬ 
mark  states.  Further,  instead  of  explicitly  matching  points  between  left  and  right 
images,  stereo  matching  is  accomplished  implicitly  through  a  nonlinear  function¬ 
fitting  process. 

To  our  knowledge,  there  exists  no  visual  SLAM  algorithm  utilizing  curve  prim¬ 
itives.  Our  algorithm  can  perform  localization  and  compactly  represent  the  map  of 
the  environment  as  a  set  of  curves.  However,  in  contrast  with  [31]  [29],  our  algo¬ 
rithm  provides  6DOF  estimation  and  can  produce  results  in  environments  where 
laser-based  techniques  would  fail.  In  particular,  the  novel  contributions  of  the 
present  thesis  can  be  stated  as  follows: 

•  We  propose  a  novel  method  to  extract  curve  parameters,  such  as  control 
points,  from  a  stereo  image  pair,  without  conventional  point-based  stereo 
matching.  This  approach  allows  us  to  obtain  an  absolute  measurement  of 
roll,  pitch,  and  height  on  each  stereo  frame. 

•  We  propose  a  novel  SLAM  formulation  utilizing  curve  structures  as  land¬ 
marks.  By  using  curve  structures,  we  are  able  to  estimate  the  full  6DOF 
pose  with  much  fewer  landmarks  than  typical  feature  point-based  methods, 
and  can  still  produce  a  structured  map,  without  odometry  or  use  of  any  sen- 
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sor  other  than  a  camera  rig.  This  is  the  first  such  method  in  existence  in 
visual  SLAM. 

The  remainder  of  this  thesis  is  organized  as  follows.  Chapter  2  provides  a  brief 
overview  of  the  CurveSLAM  algorithm;  Chapter  3  details  the  parametrization  of 
cubic  Bezier  curves  and  epipolar  geometry  of  curves;  Chapter  4  outlines  the  pro¬ 
cedure  for  extracting  curve  parameters  from  stereo  images;  Chapter  5  derives  the 
EKF-based  SLAM  formulation  and  data  association  method;  Chapter  6  presents 
the  simulation  and  experimental  results;  and  Chapter  7  contains  the  conclusion 
and  suggestions  for  ongoing  and  future  work. 
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CHAPTER  2 


CURVESLAM  ALGORITHM  OVERVIEW 


The  proposed  approach  utilizes  planar  cubic  Bezier  curves  to  represent  each  curve 
in  the  world  frame.  Each  observed  curve  is  projected  to  the  left  and  right  images 
as  a  series  of  detected  edge  points. 

Instead  of  matching  feature  points  between  the  left  and  right  images,  we  run  a 
nonlinear  curve  fitting  algorithm  to  directly  determine  the  Bezier  curve  parameters 
(refer  to  Chapter  3)  in  the  world  frame  that  best  correspond  to  the  projected  curve 
point  pixels.  In  this  way,  the  need  for  stereo  matching  is  eradicated,  and  the 
environment  can  be  represented  simply  by  a  set  of  curves. 

We  also  exploit  the  fact  that  the  curves  of  interest  are  constrained  to  the  ground 
plane.  By  imposing  this  planar  constraint,  it  is  possible  to  not  only  extract  the 
curve  parameters,  but  also  a  measurement  of  the  height  (z)  from  the  camera  to 
the  ground  plane,  and  the  absolute  orientation  (pitch  and  roll)  of  the  camera  with 
respect  to  the  ground  plane,  as  will  be  explained  in  more  detail  in  Chapter  4. 

This  assumption  is  not  unrealistic;  most  environments  have  a  ground  that  is 
close  to  planar,  and  results  from  this  proposed  algorithm  as  well  as  our  previous 
work  [6]  demonstrate  that  variations  in  this  planarity  have  a  minimal  impact  on 
the  effectiveness  of  the  algorithm. 

The  process  can  be  split  into  Curve  Fitting  (Chapter  4)  and  SLAM  (Chapter 
5)  and  is  shown  in  Figure  2.1.  The  steps  are  summarized  in  pseudocode  form  in 
Algorithm  1,  and  are  as  follows: 

1.  We  begin  with  a  time-synchronized  stereo  image  frame,  perform  edge  de¬ 
tection  to  extract  the  curve  edge  points,  and  group  the  edge  points  into  dif¬ 
ferent  curves. 

2.  The  grouped  edge  points  are  then  passed  into  a  nonlinear  Levenberg-Marquardt 
model  fitting  algorithm,  which  outputs  the  curve  control  points  as  well  as 
pitch,  roll,  and  height.  The  algorithm  determines  the  set  of  parameters  that 
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Figure  2.1:  Flow  diagram  of  the  proposed  CurveSLAM  algorithm 

minimize  the  pixel  reprojection  error  when  compared  to  the  edge  point  mea¬ 
surements. 

3.  The  curve  parameters  are  considered  as  measurements  for  the  SLAM  proce¬ 
dure.  First,  data  association  is  performed,  determining  the  correspondence 
between  the  measured  curves  and  the  existing  map  curves. 

4.  Next,  a  curve  splitting  algorithm  is  applied  to  match  the  measured  curves 
with  segments  of  existing  map  curves. 

5.  The  map  is  updated  and  the  remaining  pose  variables  (yaw,  x,  and  y)  are 
determined  by  using  an  EKF-based  SLAM  formulation. 

This  process  constitutes  one  iterative  loop  of  the  proposed  novel  CurveSLAM 
algorithm,  and  is  repeated  on  every  successive  stereo  image  frame.  From  each  of 
these  iterations,  we  can  maintain  the  full  6DOF  pose  of  the  vehicle  and  the  map 
curves  representing  the  environment. 
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Algorithm  1  CurveSLAM 
1:  while  New  stereo  image  frame  available  do 

2:  Run  edge  detection  and  group  edge  points  into  M  different  curves 

3:  y  :=  grouped  edge  points 

4:  pi  :=  control  points  of  ith  measured  curve 

5:  {z4  (p,  0}  :=  {height,  roll,  pitch} 

6:  (3  :=  [pf,  Pm,  z ,  (p ,  9] 

7:  while  ||<5||  >  e  do 

8:  predict  edge  points  y  from  parameters  (3  with  y  =  f(/3) 

9:  J  :=  ™ 

10:  <5  :=  (JTJ  +  A  diag(JTJ))  1  Jr  (y  -  f  ((3)) 

11:  (3^f3  +  8 

12:  end  while 

13:  Perform  EKF  prediction  for  robot  pose  xr 

14:  Find  correspondences  between  M  measured  curves  pi5  and  N  existing 

map  curves  xJ5  with  i  e  {0, 1, M}  and  j  e  {0, 1, N} 

15:  Split  curves  into  segments  for  direct  correspondence 

16:  Add  new  states  and  apply  EKF  update  to  state  vector  x  = 

[xr,  Xi,  Xjv]T 

17:  end  while 
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CHAPTER  3 


CURVE  PARAMETRIZATION  AND 
GEOMETRY 


This  chapter  outlines  the  epipolar  geometry  of  curves  and  introduces  the  Bezier 
curve  parametrization  used. 

For  this  research,  we  utilize  a  calibrated  stereo  camera  rig,  and  assume  that 
both  cameras  have  the  same  orientation  (i.e.,  zero  relative  rotation),  separated  by 
a  fixed  baseline  d. 


3.1  Epipolar  Geometry  of  Curves 

For  a  calibrated  stereo  camera  pair,  a  matched  stereo  measurement  of  a  single 
point  is  enough  to  fully  define  the  3D  point  in  the  body  frame  of  the  camera.  Three 
stereo  matched  points  are  enough  to  fully  define  a  plane,  with  the  assumption  that 
the  points  are  not  collinear. 

Similarly,  a  single  planar  curve  measured  in  both  left  and  right  images  is  fully 
defined  with  respect  to  the  body  frame  of  the  stereo  camera  rig.  Consider  the  ex¬ 
ample  of  a  planar  polynomial  curve  C  in  3D  space,  which  projects  to  two  different 
curves  CL  and  CR  in  each  of  the  stereo  images  (see  Figure  3.1).  In  the  inverse 
projective  mapping,  a  curve  in  a  single  image  represents  a  ruled  surface  projected 
out  from  the  optical  center  O  of  the  camera.  If  we  have  a  second  image  of  the 
same  curve,  the  curve  in  3D  space  is  the  intersection  of  the  ruled  surfaces  pro¬ 
jected  from  both  images  (Figure  3.1).  Therefore,  if  the  curve  is  observed  in  both 
images,  we  can  uniquely  determine  the  parameters  of  the  curve,  as  well  as  the  per¬ 
pendicular  distance  to  the  plane  containing  the  curve  and  the  relative  orientation 
between  the  camera  and  the  plane.  Since  the  curves  considered  in  this  thesis  are 
all  planar  (on  the  ground  plane),  each  frame  gives  us  a  measurement  of  height, 
pitch  and  roll,  in  addition  to  the  control  parameters  defining  the  curve.  Naturally, 
stereo  vision  is  especially  susceptible  to  error  in  the  case  of  distant  objects,  so  care 
must  be  taken  to  avoid  curve  features  at  a  large  distance. 
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Figure  3.1:  Planar  Polynomial  Curve  projected  to  Stereo  Images 


We  can  prove,  under  certain  assumptions,  that  the  preimage  of  two  image  curves 
is  itself  a  curve  in  world  co-ordinates.  This  proof  is  outlined  below. 


3.1.1  Proof  of  Curve  Reconstruction 

For  this  proof,  we  make  the  assumption  that  for  a  specific  curve  in  Ji3,  the  map 
/  :  3?3  — »  3ft2  projecting  the  curve  to  an  image  is  an  isomorphism.  Practically,  this 
assumption  is  not  restrictive;  it  only  implies  that  the  curve  is  “fully  observed”  in 
both  images  (i.e.,  a  curve  should  not  lose  information  and  appear  as  a  point  or  line 
when  projected  to  the  image).  In  any  case,  such  an  assumption  is  necessary  for 
visual  curve-based  SLAM. 

Background: 

For  a  smooth  map  between  manifolds  given  by  /  :  X  — >  Y,  y  E  Y  is  a  regular 
value  of  /  if  Vx  6  /_1  (y),  dfx  :  Tx X  — y  TyY  is  surjective.  Here,  Tx X  and  TyY 
are  the  tangent  spaces  of  X  and  Y  at  points  x  and  y. 

The  Preimage  Theorem:  If  /  :  X  — >  Y  is  a  smooth  map,  and  y  &  Y  is  a  reg¬ 
ular  value  of  /,  then  M  =  {x  :  x  E  /_1(l/)}  is  a  submanifold  of  A",  and  the 
codimension  of  M  in  X  is  equal  to  the  dimension  of  Y. 

Proposition:  Given  a  stereo  image  frame,  and  a  curve  observed  in  each  image, 
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the  preimage  is  itself  a  curve  in  the  world  frame. 

Proof: 

We  can  define  a  curve  in  the  left  image  as  fi(ui,vi )  =  0.  Under  normalized 
perspective  projection,  ui  =  xj z  and  vj  —  y/z.  So,  we  can  express  this  curve  as 

fi(x/z,y/z )  =  0,  fi  :  5ft3  ->■  SR1. 


Then,  the  inverse  image  of  0  is  given  by: 

Mi  =  {( x,y,z )  G  9f?3  |  fi(x/z,yfz)  =  0} 

and  using  the  Preimage  Theorem,  Mi  is  a  manifold  in  3ft3  with  codimension  1. 
Thus,  Mi  is  a  2-manifold,  which  can  be  represented  in  implicit  form  by  the  set: 

Mi  =  {(x,  y,  z)  G  3ft3  |  Fi(x,  y,  z)  =  0} 


Similarly,  if  we  define  the  curve  in  the  right  image  as  fr(ur ,  vr)  =  0,  using  a 
similar  argument  the  inverse  image  of  0  in  the  right  image  is  also  a  2-manifold 
given  by: 

Mr  =  {(x,  y,  z)  G  3tf3  I  Fr(x,  y,  z)  =  0} 

Consider  now  the  function  F  :  3ft3  — *  3?2  given  by 


F(x,y,z) 


Fi{x,y,z) 

Fr(x,y,z) 


The  inverse  image  M  of  the  stereo  image  curves  is  the  intersection  of  the  two 
surfaces  Mi  and  Mr ,  or  the  set  of  points  for  which  Ft  —  Fr  —  0: 


M  =  {(x,  y,  z)  G  3f?3  |  F(x,  y,  z)  =  0] 


Since  in  this  case,  F  :  3ft3  — »  3?2,  we  can  conclude  using  the  Preimage  Theorem 
that  the  inverse  image  of  the  point  [0,  0]T  will  be  a  manifold  of  codimension  2  in 
3?3  (i.e.,  a  1 -manifold,  or  a  curve). 
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3.2  Bezier  Curve  Characteristics 


In  the  proposed  algorithm,  planar  cubic  Bezier  curves  are  used  to  represent  the 
ground  planar  curves.  However,  the  algorithm  itself  is  scalable  to  Bezier  curves 
of  any  order. 

The  following  properties  of  Bezier  curves  [33]  make  them  favorable  for  this 
application: 

1.  A  cubic  Bezier  curve  can  be  represented  by  4  control  points  or  in  parametric 
form  with  the  position  bx(t)  and  by(t)  on  the  ground  plane  defined  on  the 
interval  t  e  [0,1].  The  control  points  define  the  shape  of  the  curve,  which 
loosely  follows  the  path  between  them,  as  shown  in  Figure  3.2. 

2.  Any  affine  transformation  on  a  Bezier  curve  is  equivalent  to  an  affine  trans¬ 
formation  on  the  control  points.  Perspective  projection  is  not  affine,  but 
for  small  distances  from  the  camera,  it  can  be  approximated  as  such;  this 
means  that  projecting  a  world  curve  to  the  left  or  right  images  is  approxi¬ 
mately  equivalent  to  mapping  the  control  points  to  the  image  planes. 

3.  The  start  and  end  points  of  the  curve  are  well-defined;  that  is,  unlike  an 
implicit  polynomial  representation,  the  curve  is  only  defined  on  an  interval 
of  the  parameter  t.  This  is  a  useful  attribute,  since  only  finite  segments  of  a 
real  world  curve  are  observed. 

4.  The  control  points  of  any  partial  segment  of  the  curve  can  be  recovered  by 
applying  a  linear  transformation  to  the  control  points  of  the  full  curve,  as 
will  be  shown  in  Section  5.1.1. 

In  fact,  many  of  these  properties  are  general  to  all  Non-Uniform  Rational  B- 
Spline  (NURBS)  curves,  but  for  the  curve  fitting  method,  specifying  the  parametriza- 
tion  is  essential.  A  specific  Bezier  curve  parametrization  also  means  that  the  curve 
is  fully  specified  with  the  control  points  (i.e.,  without  a  knot  vector  or  weights)  and 
the  transformation  between  control  points  and  polynomial  coefficients  (derived  in 
the  next  section)  is  straightforward. 


3.2.1  N otation  and  Definitions 

In  the  general  case  of  M  existing  Bezier  curves,  we  will  adopt  the  notation  p), 
with  £  e  {1,  2,  ...,  M}  and  i  e  {0,  1,  2,  3}  to  refer  to  the  ith  control  point  of 
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Figure  3.2:  A  sample  cubic  Bezier  curve  and  its  associated  control  points 


the  tth  curve.  We  also  use  the  absence  of  superscript  (z)  to  represent  the  4  control 
points  of  the  £th  curve  with  the  control  point  vector. 


Pr 


(1)  (2)  (3)  (0)  (1) 

Vx  1  Cx  1  Cx  >  Cy  1  fy 


T 

t 


(3.1) 


Then,  a  planar  Bezier  spline  function  b(f )  which  is  cubic  (n  =  3)  with  a  control 
point  vector  can  be  expressed  as: 


b  (t) 


bx(t ) 

by(t) 


te[ 0.1] 

i= 0  ^  ' 


(3.2) 


By  varying  t  between  0  and  1,  any  point  on  the  curve  can  easily  be  generated 
or  rendered. 

Further,  the  curve  control  points  can  be  transformed  to  a  set  of  parametric  poly¬ 
nomial  coefficients  to  express  the  planar  Bezier  curve  in  the  form: 


b(t) 


bx{t) 

OlQ  +  ait  +  02 12  +  03 13 

by(t) 

04  +  05 1  +  «6 12  +  ayt3 

t  e  [0,  l] 


(3.3) 


Rearranging  the  Bezier  expressions,  we  can  find  the  direct  relationship  for  co¬ 
efficients  o3  as: 
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ni 


if  i  <  4 


ai  =  < 


n\  -y^  (-1  )j+lp{x 


n\  sy  (-1  )j+iPy 

(' n-j)l ^ 


(*) 


if  i  >  4 


(3.4) 


Expressing  the  coefficients  as  a  vector  cte  =  [a0,  a4,  a2,  a3,  a4,  a5,  a6,  a7\eT, 

the  transformation  between  cue  and  p/  is  linear,  with  Jacobian  expressed  using  ma¬ 
trix  A  G  3?4x4: 


{  n\  (-1)<+J 
(n-i)!  j\(i-  j)\ 


if  j  <  i 
if  j  >  i 


(3.5) 


do:i>  A 

dpf  0 

The  matrix  A  is  repeated  because  the  same  transformation  has  to  be  applied 
separately  to  both  the  x  and  y  coordinates  of  the  control  points  to  obtain  the  in¬ 
dependent  parametric  polynomial  coefficients  in  x  and  y.  This  form  will  arise 
frequently  throughout  the  remainder  of  this  thesis. 


0 

A 


(3.6) 


3.3  Camera  Projection  Model 

For  a  Bezier  curve  being  viewed  by  two  cameras,  we  can  easily  approximate  the 
equivalent  projected  curves  in  the  left  and  right  images  by  applying  the  camera 
projective  transformation  to  the  Bezier  control  points. 

First,  we  define  the  frames  of  reference  as  follows  (see  Figure  3.3).  The  Body 
frame  f/,  is  used  to  denote  the  frame  fixed  at  the  center  of  the  stereo  head,  with 
the  x,  y,  z  axes  following  a  Forward-Right-Down  convention.  The  Focal  Ground 
frame  fs  is  the  projection  of  the  Body  frame  onto  the  ground  plane.  Finally,  the 
Earth  frame  fe  is  fixed  as  the  inertial  frame. 

In  other  words,  0,  6  and  z  (the  “out-of-plane  degrees  of  freedom”)  transform 
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Figure  3.3:  Graphical  depiction  of  stereo  rig  and  ground  plane,  with  frames  of 
reference  shown 

between  the  Body  and  Local  ground  frames,  while  x,  y  and  0  (the  “planar  degrees 
of  freedom”)  transform  between  the  Local  Ground  and  Earth  frames. 

This  selection  of  reference  frames  is  natural  for  this  application,  because,  as 
mentioned  earlier,  each  stereo  image  frame  only  provides  the  out-of-plane  degrees 
of  freedom;  the  in-plane  motion  cannot  be  determined  from  a  single  frame. 

In  the  defined  frames,  a  point  on  the  ground  plane  with  Local  Ground  frame 
co-ordinates  x,;  =  [xg,  yg,  0]r  can  then  be  mapped  to  a  point  in  both  the  left  and 
right  image  (ui,  Vi)  and  (//,..  vr)  using  the  simple  transformation: 


xb 

Mb 

=  r  gb(4>,oy 

-\xg-tgb(z)) 

(3.7) 

Zb_ 

fu 

Zb  ~\~  Cy 

xb 

fu 

Ur  —  - Zb  +  Cy 

Xb 

(3.8) 

—  {Vb  +  x)  +  Cx 

Xb  2 

fv  i  d. 

Vr  =  - (2/6  -  Z)  +  Cx 

Xb  2 

where  d  is  the  baseline,  fx  and  fy  are  the  focal  lengths  in  the  x  and  y  directions  (in 
pixels),  and  cx  and  cy  are  the  image  x  and  y  coordinates  of  the  principal  point  (in 
pixels).  R gb((j),  6)  and  t gb(z)  represent  the  rotation  matrix  and  translation  vector 
to  transform  a  vector  between  the  Body  and  Local  Ground  frames: 
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—  sin  6 


Ryb((p,  6 ) 

t  gb{z) 


cos  9  0 

sin  6  sin  0  cos  0  cos  6  sin  0 
sin  6  cos  0  —  sin  0  cos  6  cos  0 

O' 

0 

z 


(3.9) 


Using  the  second  property  in  Section  3.2,  a  planar  Bezier  curve  in  the  ground 
frame  maps  to  another  Bezier  curve  in  both  the  left  and  right  images,  and  the 
control  points  of  the  projected  curves  in  the  left  and  right  images  can  be  found 
by  applying  (3.7)  and  (3.8)  to  the  control  points  of  the  curve  in  the  Local  Ground 
frame. 
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CHAPTER  4 


CURVE  FITTING 


This  chapter  outlines  the  process  of  extracting  the  curve  control  points  and  out- 
of-plane  pose  parameters  from  each  stereo  image  pair. 


4. 1  Edge  Detection  and  Grouping 

The  first  step  for  each  stereo  frame  is  to  obtain  edge  points  in  both  images  corre¬ 
sponding  to  the  curves  on  the  ground  plane.  Then,  once  an  edge  detector  has  been 
applied,  it  is  necessary  to  minimize  the  amount  of  noise  pixels  (corresponding  to 
irrelevant  edges)  and  group  edge  points  into  different  curves. 

For  the  purposes  of  this  thesis,  we  are  interested  in  mapping  path  environments, 
with  map  curves  corresponding  to  the  left  and  right  edges  of  the  path.  Accord¬ 
ingly,  we  extract  the  edge  points  as  follows.  First,  a  Gaussian  smoothing  filter 
[34]  is  applied  to  the  image  to  minimize  noise  in  edge  detection.  Then,  we  ex¬ 
ploit  the  lack  of  color  saturation  of  the  path  itself,  switching  to  HSV  space  and 
extracting  the  saturation  image.  Finally,  we  apply  the  Canny  edge  detector  [35] 
on  the  saturation  image,  with  a  very  high  hysteresis  /  stitching  threshold  and  a 
very  low  detection  threshold,  such  that  only  a  few  significant  edges  are  detected, 
but  contain  a  large  number  of  pixel  points.  With  the  edge  points  corresponding  to 
either  the  left  or  right  edges  of  the  path,  grouping  is  straightforward.  Further,  the 
edge  points  are  pruned  as  necessary  to  ensure  that  the  start  and  end  points  of  each 
curve  are  approximately  equivalent  in  the  left  and  right  images. 

By  applying  this  procedure,  we  obtain  a  series  of  edge  points,  grouped  into 
different  curves,  in  both  the  left  and  right  images.  Typical  results  of  this  method 
are  shown  in  Figure  4.1. 

It  is  important  to  note  that  the  edge  grouping  requirements  may  vary  depending 
on  the  robot’s  environment  and  the  quantity  of  curve  structures  that  need  to  be 
mapped.  For  instance,  in  the  case  of  this  thesis,  we  are  most  interested  in  map- 
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Figure  4.1:  Result  of  Canny  detection  and  edge  grouping  to  extract  path  edge 
curves 

ping  path  and  riverine  environments,  where  the  edge  points  only  correspond  to 
a  few  different  curves  (the  edges  of  the  river  or  path).  In  contrast,  in  an  indoor 
environment  cluttered  with  objects,  it  may  be  necessary  to  group  edge  points  into 
a  large  number  of  different  structures. 

Thus,  depending  on  the  requirement,  this  particular  step  of  the  algorithm  may 
be  altered,  to  use  in  conjunction  with  the  proposed  curve  fitting  and  SLAM  algo¬ 
rithms  outlined  in  this  chapter  and  the  following  chapters. 


4.2  Nonlinear  model  fitting 

Once  these  grouped  edge  points  are  obtained,  the  Levenberg-Marquardt  algorithm 
[36]  is  applied  to  fit  the  camera  projection  model  to  these  edge  point  measure¬ 
ments.  With  the  model  described  by  (3.2)  -  (3.8),  the  image  measurements  are 
purely  a  function  of  the  parameters  we  attempt  to  estimate  (the  planar  curve  con¬ 
trol  points  and  the  relative  orientation  between  the  stereo  camera  pair  and  the 
ground  plane).  More  specifically,  we  have  measurement  vector  y  comprising  n 
curve  points  in  the  left  image  and  m  curve  points  in  the  right  image  (grouped  into 
M  different  curves),  and  the  parameter  vector  (3: 


y  (Wl  1  1  •••)  ^Inl  ^ln 


P  = 


P  M  j  j  ^  r  ■ 


(4.1) 


Here,  pi,  ...,  p m  represent  the  control  point  vectors  of  each  of  the  M  Bezier 
curves  observed  in  the  image,  with  co-ordinates  defined  in  the  Local  Ground 
frame.  The  parameter  variable  zr  is  the  height  of  the  robot  above  the  ground 
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plane,  and  9r  and  0r  are  the  pitch  and  roll,  defined  as  Euler  angles.  Then,  we 
attempt  to  fit  the  model  y  =  f  {(3)  described  by  (3.2)  -  (3.8). 

On  each  iteration,  the  planar  orientation  and  the  current  curve  parameters  are 
used  to  determine  the  projected  Bezier  curves  in  the  left  and  right  images,  using 
(3.8).  The  reprojection  error  for  each  measured  edge  pixel  is  computed  as  the 
distance  between  the  pixel  and  the  nearest  point  on  the  projected  Bezier  curve. 
The  nearest  point  can  be  determined  algebraically  by  setting  the  derivative  of  the 
distance  to  zero  (see  chapter  4.3.1).  Then,  each  iteration  k  aims  to  minimize  the 
total  reprojection  error,  adding  update  8k  to  the  parameter  vector  / 3k : 


Sk  =  (JTJ  +  A  diag(JTJ))_1  JT  (y  -  f(/3fc)) 

(3k+ 1  —  (3  k  +  8k  (4.2) 

The  Jacobian  J  =  is  the  matrix  of  first-order  partial  derivatives  of  the 
measurements  with  respect  to  the  parameters,  and  will  be  derived  in  the  next  sec¬ 
tion.  The  parameter  A  is  a  damping  parameter  that  was  tuned  to  optimize  the 
convergence  characteristics  of  the  Levenberg-Marquardt  algorithm. 


4.3  Jacobian  Derivation 

Given  the  measurement  vector,  y  =  [uh,  vh,  ... uln ,  vtn,  uri,  vri,  vrJT, 

and  the  parameter  vector  (3  =  [piT,  ....  p mt,  z,  9 ,  0]T,  with  model  y  =  f  (/3), 
the  Jacobian  to  be  determined  is  J  =  4^. 

Every  measured  point  corresponds  to  a  point  in  the  Local  Ground  frame,  which 
corresponds  to  a  point  on  a  curve.  Suppose  a  measured  point  corresponds  to  curve 
7,  with  control  point  vector  p7  and  coefficient  vector  ck7.  Then,  recalling  the 
stereo  camera  projection  equations  defined  in  (3.8)  and  the  parametrization  for 
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cubic  Bezier  curves,  we  have  the  following  relations: 


x 


a 


Xfe 

ui 


Vl 


i=0 


EL  )(i-t)°-Y 


Pix 

Piy 

0 


R Vb(<t>,0)  1(^g-tgb(z)) 

fu  ^ 

—  Zb  +  Cy 
Xb 

—  ( Ub  +  x)  +  cx 

xb  2 


,  *e[o,i] 


_  fu  , 

Ur  -  - Zb  +  Cy 

Xb 

Vr  =  —{Vb  ~  7))  +  Cx 
Xb  2 


(4.3) 


Here,  xg  is  a  point  which,  from  the  edge  grouping  stage,  is  known  to  lie  on  curve 
7.  For  explanation  of  these  terms,  refer  to  (3.8). 

For  any  curve  k,  we  can  easily  transform  the  control  point  vector  pfc  to  para¬ 
metric  polynomial  form  with  coefficient  vector  a^,  and  know  the  Jacobian 
(Section  3.2).  Then,  the  Jacobian  of  x,;  with  respect  to  each  of  the  control  point 
vectors  pk  can  be  calculated  as  follows: 


dxs 

dpk 


d^Cg 

dxxk 


chtg  dock 
dcxk  dpk 

1  t  t2  t3  0  0  0  0 

0  0  0  0  1  t  t2  t3 

< 

000  0000  0 

0 


(4.4) 


if  k  —  7 


(4.5) 


otherwise 


In  other  words,  the  rows  of  the  Jacobian  corresponding  to  a  measured  point  are 
only  nonzero  for  the  corresponding  curve.  The  value  of  t  used  for  each  measure¬ 
ment  will  be  explained  later  in  this  section. 
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Next,  the  Jacobians  of  x&  with  respect  to  the  parameters  (3  are  as  follows: 


dx.b 

dpk 

dx.b 

~d0 


dx.b 

d(j) 

dxb 

dz 

dxb 

d/3 


dpk 

—  sin  9  0—  cos  6 

cos  9  sin  0  0  —  sin  9  sin  0 
cos  9  cos  0  0  —  sin  9  cos  0 


tgb(z) 


(4.6) 

(4.7) 


0  0 
sin  9  sin  0  —  sin  0 

—  sin  9  sin  0  —  cos  0 


0 

—  COS  9  COS  0 

—  cos  9  sin  0 


Xg  -  tgb(z) 


(4.8) 


-R  gb^.9)-1 


0 

0 

1 


(4.9) 


dx.fr  (foe*,  dXfc  dXf,  dx&  dXfc 

dpi  <ip2  *  ’  ’  c/pq  d6  d(f)  dz 


(4.10) 


The  full  Jacobian  J  =  dt.<)p}  can  then  be  computed  by  using  the  stereo  projection 
equations  (3.8).  The  rows  of  the  Jacobian  vary  depending  on  whether  the  edge 
point  corresponds  to  the  left  or  right  image.  For  a  left  image  edge  point  u/  = 
[ui,  Vi}1  and  right  image  edge  point  ur  =  [ur,  vr}T,  the  corresponding  rows  of  the 
Jacobian  can  be  found  with  the  following  relations: 


fu^b 

0 

fu~ 

dui 

dui  dxb 

dui 

‘2 

Xb 

Xb 

d/3 

dxb  d/3 

dxb 

fv{ljb  +  |) 

fv 

0 

Xb 

xb 

fu^b 

0 

fu 

dur 

dur  dxb 

dur 

,  b  ^ 

Xb 

dp 

dxb  d/3  ’ 

dxb 

fvidJb  ~  |) 

fv 

0 

rf.2 

L  xb 

Xb 

(4.11) 


(4.12) 


One  important  thing  to  note  is  that  this  Jacobian  still  depends  on  the  unknown 
parameter  t  through  the  term  ^0.  To  determine  this  t- value  for  each  row  of  the 
Jacobian  (i.e.,  for  each  measurement),  we  need  to  determine  the  point  on  the  pro¬ 
jected  image  curve  that  is  closest  to  the  measured  point.  In  other  words,  we  need 
to  use  the  parameter  estimates  to  project  the  world  curve  to  the  left  or  right  image 
(depending  on  the  location  of  the  measured  point),  and  the  t-value  of  the  measured 
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point  is  the  same  as  that  of  the  closest  curve  point. 

The  process  of  finding  the  nearest  curve  point  to  an  external  (measured)  point 
is  detailed  in  the  next  section. 


4.3.1  Nearest  point  Calculation  for  Cubic  Bezier  Curves 

Assume  (all  in  image  co-ordinates)  that  the  measured  edge  point  is  given  by 
(um,  vrn),  the  closest  curve  point  occurs  at  t  —  t*,  and  the  projected  image  curve 
has  control  points  (w0,i?0)  ...  (u3,v3): 


uc(t)  =  Y)?  n  (n)(l  —  t)n  ltlu.i 

c\  >  z^=o  \x)\  >  *  te  0  p 

The  distance  between  the  measured  point  and  the  point  on  a  curve  is: 


(4.13) 


s  =  \f ( uc(t )  -  umf  +  (vc(t)  -  vm )2  (4.14) 

To  find  the  nearest  point,  we  set  the  derivative  of  the  distance  with  respect  to  t  to 
zero.  Taking  the  derivative  and  simplifying  the  expression,  we  have  the  following 
polynomial  equation  in  t*\ 


5 

*= o 

Co  =  u2(u3  -  Um)  +  V2(v3  -  Vm) 

Cl  =  U2  +  2m!(m3  -  Um)  +  V22  +  2vi (v3  -  vm) 
c2  =  3(m2Mi  +  u0(u3  -Um)+  V2Vi  +  v0(v3  -vm)) 
c3  =  2{2u2Uq  +  U\~  +  2v2v0  +  ni 2 ) 
c4  =  5(ui  u0  +  Viv0) 

C5  =  3  (w02  +  Co2)  (4.15) 

This  can  be  solved  using  a  numerical  root  finding  technique  such  as  Newton’s 
method.  However,  this  process  needs  to  be  performed  for  every  measured  edge 
point  on  every  iteration  of  the  curve  fitting  algorithm.  As  a  result,  computational 
expense  is  a  necessary  consideration.  To  minimize  the  computational  cost,  we  ap¬ 
proximate  this  process  as  follows.  Given  a  sequence  of  edge  point  measurements 
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corresponding  to  a  measured  curve,  we  can  “assign”  a  t-value  to  each  measure¬ 
ment  based  on  its  position  in  the  sequence.  For  example,  the  first  point  is  assigned 
t  —  0,  the  last  point  has  t  —  1,  and  the  t-values  for  all  points  in  between  are 
distributed  evenly.  Then,  this  iterative  technique  is  avoided,  and  the  closest  curve 
point  can  be  computed  directly  from  the  t-value  associated  with  the  measured  edge 
point. 

This  approximation  is  accurate  under  the  assumption  that  the  edge  points  are 
evenly  distributed  (i.e.,  there  are  no  large  gaps),  which  is  already  a  requirement 
for  good  curve  fitting  results. 


4.3.2  Initial  Guess  of  Parameters 

The  Levenberg-Marquardt  approach,  like  most  other  iterative  techniques,  is  sen¬ 
sitive  to  the  initial  parameter  estimates,  so  an  intelligent  starting  point  must  be 
selected.  Here,  our  initial  guess  is  obtained  by  looking  at  the  start  and  end  points 
of  the  measured  curves.  Since  we  already  ensure  that  the  start  and  end  points 
are  approximately  equivalent  in  both  images,  we  have  an  approximate  stereo  cor¬ 
respondence.  Thus,  we  can  triangulate  their  positions  in  the  world  frame  using 
(3.8),  obtaining  2 M  points  on  the  ground  plane  (recall  that  M  is  the  number  of 
observed  curves).  The  plane  of  best  fit  for  these  points  provides  an  initial  estimate 
of  the  roll,  pitch  and  height,  and  by  transforming  the  points  using  the  determined 
angles,  we  also  obtain  estimates  for  the  start  and  end  control  points  of  the  curve  in 
the  world  frame.  Linearly  interpolating  between  these  points  gives  us  estimates  of 
the  remaining  curve  control  points  that  are  practical  for  most  operating  scenarios. 


4.3.3  Outlier  Rejection 

While  experiments  with  the  curve  fitting  approach  demonstrate  it  to  be  feasible  in 
determining  the  camera’s  orientation  with  respect  to  the  ground  plane,  it  is  pos¬ 
sible  on  some  frames  for  the  nonlinear  optimization  to  diverge  or  converge  to  an 
incorrect  equilibrium,  since  the  problem  is  not  convex.  To  ensure  that  outlying 
curve  fitting  results  do  not  adversely  affect  the  SLAM  results,  a  simple  outlier  re¬ 
jection  technique  has  been  used.  At  each  stereo  frame,  if  the  measured  parameters 
(output  of  the  curve  fit)  lie  more  than  3  standard  deviations  away  (in  terms  of  the 
measurement  covariance)  from  the  current  estimate  of  roll,  pitch  and  height,  the 
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frame  is  discarded  and  the  SLAM  algorithm  moves  to  the  next  frame.  Similarly, 
if  the  curve  fit  has  a  fitting  error  over  a  predefined  threshold,  the  frame  is  also 
discarded. 

4.3.4  Summary 

By  applying  this  technique  on  raw  edge  point  measurements  (grouped  into  dif¬ 
ferent  curves),  the  algorithm  obtains  the  optimal  estimates  for  the  curve  control 
points  and  the  out-of-plane  pose  parameters,  minimizing  the  reprojection  error 
between  the  curve  model  and  the  measured  edge  pixels. 

The  final  result  is  that  each  stereo  frame  gives  us  an  absolute  measurement  of 
three  of  the  degrees  of  freedom  and  an  expression  for  the  curves  in  the  ground 
plane. 
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CHAPTER  5 


SLAM  FORMULATION 


This  chapter  details  the  SLAM  formulation  for  the  proposed  algorithm.  In  the 
following  sections,  the  data  association  technique  is  first  discussed,  a  necessary 
curve  splitting  algorithm  is  explained,  and  then  the  SLAM  equations  are  derived. 


5 . 1  Curve  Data  Association 

Before  deriving  the  SLAM  equations,  we  first  need  to  consider  data  association 
between  measured  curves  (control  points  obtained  in  the  curve  fitting  procedure) 
and  map  curves  (landmarks  existing  in  the  map). 

Specifically,  we  need  to  consider  that  measured  curves  do  not  have  a  one-to-one 
correspondence  with  the  map  curves.  Indeed,  the  measured  curve  may  correspond 
to  part  of  a  single  map  curve,  or  parts  of  multiple  connected  map  curves,  etc.  This 
is  illustrated  in  the  example  of  Figure  5.1(a),  where  the  measured  curve  z  matches 
two  map  curves  x,  and  xr  Thus,  we  need  to  determine  a)  the  map  curve(s)  to 
which  the  measured  curve  corresponds,  and  b)  the  start  and  end  points  of  the 
map  curve  segments  to  which  the  measured  curve  corresponds.  In  the  example 
of  Figure  5.1(a),  this  means  we  need  to  determine  the  parameter  values  U,  tj  and 
tz.  Then,  the  algorithm  can  determine  whether  to  add  curves,  update  curves,  or 
perform  both. 

To  determine  these  parameter  values,  we  track  the  endpoints  of  the  existing  map 
curves  (all  in  the  left  image  frame)  using  the  Lucas-Kanade  tracking  algorithm 
[34].  It  is  important  to  stress  that  the  algorithm  only  needs  to  track  the  endpoints 
of  the  map  curves  (as  shown  in  Figure  5.2);  all  structural  information  of  the  curves 
is  maintained  through  the  curve  parameters,  thereby  avoiding  the  drawbacks  of 
point-based  methods  outlined  earlier. 

The  procedure  for  each  curve  is  as  follows: 

1.  The  endpoints  of  the  map  curve  are  continuously  tracked  in  all  image  frames 
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Figure  5.1:  Updating  map  curves  (left),  and  adding  new  curves  to  the  map  (right) 
based  on  a  measured  curve  at  one  timestep.  The  measured  curve  is  shown  in  red, 
and  the  map  curves  are  in  blue  and  black. 

(from  the  point  it  is  first  initialized). 

2.  The  measured  curve  (i.e.,  the  output  of  the  curve  fit)  is  projected  back  to  the 
image. 

3.  Then,  tz  is  the  t-value  of  the  point  on  the  projected  curve  closest  to  the 
tracked  endpoint,  and  can  be  found  using  the  procedure  outlined  in  chapter 
4.3.1. 

4.  We  then  specify  t,  =  t j  =  1  —  tz. 

While  this  last  step  may  seem  imprecise,  it  is  effective  in  practice,  and,  in  fact, 
naturally  regulates  the  length  of  each  introduced  map  curve.  If  we  ensure  that  the 
number  of  edge  pixels  in  each  measured  curve  is  reasonably  constant  (i.e.,  same 
“size”  in  successive  images)  then  as  the  map  endpoint  moves  along  the  measured 
curve  (Figure  5.2),  we  observe  more  of  the  “new”  map  curve  and  less  of  the  pre¬ 
vious  map  curve.  In  Figure  5.1,  this  is  equivalent  to  the  measured  curve  z  cor¬ 
responding  entirely  with  x,;  initially,  and  then  moving  gradually  to  Xj.  Thus,  we 
ensure  that  the  t-value  changes  uniformly  as  motion  occurs,  and  that  new  map 
curves  have  approximately  similar  lengths. 

To  mitigate  the  effect  of  tracking  errors,  the  algorithm  compares  the  motion  of 
each  tracked  point  with  the  motion  of  two  other  proximal  edge  points  (both  lying 
on  the  measured  curve).  If  the  difference  exceeds  a  threshold,  the  data  association 
is  considered  a  failure,  and  the  measured  curve  is  independently  added  to  the  map. 
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Figure  5.2:  Early  frame  (left)  and  current  frame  (right),  with  measured  curve 
endpoints  for  each  frame  in  red  and  map  curve  endpoints  in  blue 


5.1.1  Curve  Splitting  Algorithm 

Next,  before  we  can  derive  the  observation  model  and  the  SLAM  algorithm,  we 
need  to  develop  a  technique  for  updating  an  entire  map  curve  based  on  a  partial 
measurement. 

This  can  be  achieved  by  taking  advantage  of  the  De  Casteljau  algorithm  for 
recursively  evaluating  Bezier  curves  [33].  For  any  n- degree  Bezier  curve,  this 
approach  provides  a  linear  relation  that  can  be  used  to  split  the  curve  at  any  spe¬ 
cific  parameter  value  ts  into  two  n-degree  Bezier  curves.  For  a  cubic  curve,  the 
algorithm  is  defined  as  follows: 

1.  Start  with  the  4  Bezier  control  points  {af,"*.  a^,  a{,2\  a^},  and  choose  the 
split  parameter  value  ts. 

2.  Divide  each  segment  of  the  control  polygon  in  the  ratio  of  ts  to  1  —  ts  to 
construct  the  3  “intermediate”  control  points  {af\  a':, 1 ' .  a,|2)  }. 

3.  Recurse  until  the  result  is  a  single  point;  a^3°\ 

Figure  5.3  shows  this  process,  and  the  two  dimensional  intermediate  control 
points  are  denoted  as  a  -  ,  where  j  is  the  level  of  recursion  and  i  is  the  number  of 
the  control  point  for  the  intermediate  curve. 

Thus,  by  this  recursive  definition,  each  control  point  of  the  n-degree  intermedi¬ 
ate  curve  is  defined  as  a  convex  combination  of  two  control  points  of  the  (n  +  1)- 
degree  intermediate  curve  (Figure  5.3),  with  the  recurrence  relation: 

a$l)  =  (1  -  ts) a^lj  +  fsa^1}  (5.1) 
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Figure  5.3:  Recursive  definition  of  Bezier  curves 


Noticing  that  the  intermediate  points  also  act  as  control  points  for  specific  seg¬ 
ments  of  the  curve,  we  can  use  this  formulation  to  split  the  original  Bezier  curve 
into  two  curves  of  the  same  order,  at  the  point  a^.  Referring  to  Figure  5.3,  the 
control  points  of  the  original  curve  are  (a^,  a^,  aj,2i,  a(j3) }  and  the  control  points 
of  the  split  curves  are  given  by  {a^,  a^°\  a^,  al^ }  and  {a^,  a  a(j3i}. 

Using  the  notation  introduced  in  chapter  3.2,  we  can  define  the  control  point 
vector  of  the  original  curve  as  p<?  =  p^\  px  \  p^\  p i3),  Py) ,  Py\  p (y\  p y3)  , 

and  the  control  point  vectors  of  the  split  curves  in  the  same  form  as  p ti  and  pf2  to 
refer  to  the  first  and  second  segments  respectively.  Again  following  the  previously 
defined  notation,  we  can  refer  to  each  control  point  as  p^W  for  the  original  curve, 
and  p/  j*'1  and  p$  for  the  split  curves. 

Then,  using  (5.1),  we  get  the  following  relationships: 


p?  =  J2  jp? 

j= 0  ^ 

PS  =  (5.2) 

j=i  '  ' 

We  can  write  this  in  matrix  form  using  lower  triangular  matrix  Si  and  upper 
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triangular  matrix  S2: 
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0 
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0 

52 


P  e 


P  e 


(5.3) 


Thus,  importantly,  any  segment  of  a  Bezier  curve  can  be  obtained  from  the 
whole  curve  by  applying  a  linear  transformation  to  the  original  control  points. 
Since  the  transformation  matrices  are  triangular  with  non-zero  diagonal  elements, 
they  are  invertible.  Further,  they  are  independent  of  the  control  points  and  are  only 
a  function  of  the  parameter  value  at  the  split  point,  ts. 

Given  this  method  for  splitting  curves,  we  can  easily  equate  a  measured  curve 
to  any  combination  of  existing  map  curves  in  order  to  a)  update  the  existing  map, 
and  b)  add  new  curves  to  the  map.  Based  on  the  data  association  parameters 
determined  using  the  procedure  in  chapter  5. 1,  it  is  easy  to  determine  which  matrix 
to  use. 


5.2  State  Model 

The  state  model  is  similar  to  a  point-based  EKF-SLAM  formulation,  with  the 
state  vector  x  containing  the  robot  pose  xr  and  N  landmark  states  x,  with  i  e 
(1,  2,  ...,  N}. 

x  —  [xrT,  X!r,  ...,  XjvT]T  (5.4) 

The  robot  pose  state  consists  of  the  vehicle  position  vector  rr  =  \xr,  yr,  zr]T , 
orientation  in  terms  of  Euler  angles  \Fr  =  [cj)r ,  0r ,  ky]r,  the  linear  velocity  vr  and 
angular  velocity  u)r,  all  defined  with  respect  to  the  Earth  frame.  By  using  this 
parametrization,  the  process  model  is  purely  linear. 
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X 


(5.5) 


T 


The  landmark  states  x1;  x2, are  the  control  point  vectors  of  each  of  the 
N  planar  curves  in  the  Earth  frame,  with  the  jth  landmark  given  by  control  point 
vector  xj  =  [x(°),  x^2\  x^3\  y(°\  y^\  y^2\  yW]  T ,  where  (j;W,  r/*))  are 

the  coordinates  of  the  ith  control  point. 

As  in  [31],  it  is  assumed  that  the  state  is  normally  distributed,  with  an  estimate 
at  timestep  k  of  the  mean 


it(k\k) 


xr(/c|/c)T,  X!(A;|A;)t,  itN(k\k)T 


and  covariance  matrix 


(5.6) 


P(k\k) 


Prr(k\k)  PV)i(k\k) 
P  i,r{k\k)  P  i,i(k\k) 


Pr,N(k\k) 

P  i,N(k\k) 


PN,r(k\k)  PNjl(k\k)  ■■■  PN,N(k\k) 


(5.7) 


Here,  the  submatrices  Pr,r(k \ k),  Pr,i(k \ k),  and  Pij(k \ k)  (withi  e  (1,  2,  N}) 

are  the  cross-covariances  between  the  robot  pose  (subscript  r)  and  the  N  curves 
(subscripts  1,  2, N). 


5.3  Vehicle  Process  Model 

For  the  prediction  step,  a  kinematic  motion  model  is  used.  We  model  the  unknown 
linear  and  angular  accelerations  as  zero-mean  Gaussian  vectors  a  ~  N  (0,  Ea) 
and  a  ~  J\f  (0,  Sa)  respectively,  both  in  the  Earth  frame,  such  that  at  timestep  k, 
the  linear  and  angular  velocities  experience  a  change  of  approximately  V (k)  = 
a  A  t  and  fl(k')  =  a.  At.  We  get  the  following  vehicle  process  model: 
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xr(/c  +  1) 


rr{k  +  1) 

'S>r{k+  1) 

vr(/c  +  1) 
ujT  ( k  -)- 1 ) 


=  xr(/c)  + 


(vr(/c)  +  V(/c))  At 

(a }r(k)  +  f l(k))  At 

V(k) 
n  (k) 


(5.8) 


The  tradeoff  for  linearity  in  the  process  model  is  the  fact  that  the  process  noise 
covariance  matrices  £a  and  are  not  diagonal  (indeed,  they  are  diagonal  in 
the  body  frame).  In  other  words,  the  angular  and  linear  accelerations  are  only 
independent  in  the  body  frame,  and  the  associated  diagonal  covariance  matrix 
will  require  a  basis  change  from  the  Body  frame  to  the  Earth  frame  to  obtain 
and  £q. 


5.4  Observation  Model 

The  state  vector,  comprising  the  pose  state  and  all  of  the  landmark  states,  is: 

X  =  [xjf  ^  VrT  Ur1  ...  XatT]T  (5.9) 

The  measurement  vector,  containing  the  out-of-plane  pose  measurement  {< f>r ,  6r,  u>r } 
and  the  M  curve  measurements  in  the  Local  ground  frame,  is: 

z  =  [<j)r  0r  zr  ziT  ...  zmt]T  (5.10) 

These  are  measured  from  the  curve  fitting  process,  outlined  in  Chapter  4. 

Since  the  out-of-plane  pose  is  measured  directly,  the  update  step  for  these  vari¬ 
ables  is  straightforward.  Nonetheless,  the  relationship  between  the  existing  curve 
states  and  the  measurements  needs  to  be  determined. 
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5.4.1  Updating  Existing  Curve  States 

Consider  the  case  in  Figure  5.1(a)  where  a  measured  curve  z  corresponds  as  shown 
to  two  existing  map  curves  x*  and  xy. 

With  the  ti,  tj  and  tz  values  representing  the  correspondences  (obtained  from 
the  data  association  step),  we  can  directly  derive  the  observation  model  by  split¬ 
ting  the  curves  with  de  Casteljau’s  algorithm  (Section  5.1.1).  First,  we  split  the 
original  measurement  into  two  (zx  and  z2),  and  then  equate  these  two  with  the 
appropriate  segments  of  the  map  curves: 


s  2(ti) 

0 

0 

s  2{ti)_ 

Si  (tj) 

0 

0 

Si  (tj) 

(5.11) 


Here,  S 2(C)  and  Si  (tj)  are  the  linear  transformations  required  to  split  the  curves 
at  parameter  values  ti  and  tj  (defined  in  Section  5.1.1),  and  need  to  be  applied 
separately  to  both  the  x  and  the  y  co-ordinates  of  the  map  curves. 

In  the  general  case,  we  have  a  series  of  split  measurements  each  corresponding 
to  a  segment  of  a  different  curve.  To  handle  this  case,  let  each  split  measurement 
z i  correspond  to  existing  map  curve  i  with  associated  split  matrix  S,  determined 
from  the  t-value  at  the  data  association  step.  Then,  we  can  generalize  this  to: 


z  i  = 


S  i  0 

0  Si 


(5.12) 


If  we  consider  that  the  measurement  is  made  in  the  vehicle  body  frame,  and 
the  vehicle  has  planar  pose  of  {x,  y ,  C  },  then  the  observation  equation  in  the  form 
z  =  h  (x)  becomes  the  following: 


Z  i  =  R  eg{A 


\-l 


s*  0 

0  Si 


X-i  y 


(5.13) 


where  Re9(r/y)  and  teg(xr,  yr)  are  the  rotation  matrix  and  translation  vector  trans¬ 
forming  the  curve  parameters  between  the  Local  Ground  frame  and  the  Earth 
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frame. 

Since  we  have  4  different  x  and  y  co-ordinates  (corresponding  to  the  control 
points),  the  expressions  for  the  rotation  and  translation  matrices  vary  slightly  from 
the  well-known  matrix  forms.  If  we  define  the  column  vector  U  =  [1,  1,  1,  l]7 
and  utilize  the  4x4  identity  matrix  I4,  we  have: 


I4  COS  A 

— 14  sin  Vy 

t 

Ua:r 

I4  sin  'A 

I4  COS  7pr 

■ 

_U  yr_ 

The  standard  EKF  update  equations  are  then  applied. 


(5.14) 


5.4.2  Adding  New  States 

In  the  case  where  map  curve  x,-  does  not  yet  exist  (Figure  5.1(b)),  we  can  insert  the 
curve  into  the  state  vector  and  augment  the  covariance  matrix  with  the  necessary 
cross-covariances  Pjv+i,jv+i>  Pr,jv+i>  and  P^jv+i: 


XjV+l 


P  JV+1,JV+1 
P  r,iV+l 

P  i,N+l 


=  g  (x,  z) 

"S"1  0 

0  S"1 


P'C(/Z2  T  teg. 


=  GCPG^  +  GZRGJ 


=  P 
=  P 


T 

N+l,r 

T 

N+l,i 


PrrG 


T 

x 


P,  ;G 


T 

X 


(5.15) 


where  R  is  the  measurement  covariance  matrix,  S  is  the  split  matrix  associated 
with  the  newly  added  curve,  and  Gx  =  ||  and  Gz  =  ||  are  the  Jacobians  of 
g  (x.  z)  with  respect  to  the  state  and  measurement  respectively. 


5.5  Extended  Kalman  Filtering 

Given  the  process  and  observation  models,  the  standard  EKF  equations  follow, 
outlined  in  detail  in  the  following  sections. 
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5.5.1  Prediction  stage 


In  the  prediction  stage  at  timestep  k  +  1,  the  estimated  state  and  covariance  matrix 
are  updated  using  the  Jacobian  F  of  the  process  model  f(x): 


x(/c  +  1  \k) 

=  f  (x(/c|/c)) 

(5.16) 

Pr,r(k  +  1  \k) 

=  FP  ,,r(k\k)FT  +  Q{k) 

(5.17) 

P  r,i(k  +  1|  k) 

=  FP  ,,i(k\k) 

(5.18) 

P  i,r(k  +  1  \k) 

=  P  i>r(k\k)FT 

(5.19) 

p  i,i{k  +  1  \k) 

=  P  i,i(k\k) 

(5.20) 

The  Jacobian  F  is  found  by  linearizing  the  process  model  about  the  current 
operating  point  (i.e.,  the  current  pose  estimate).  Since  only  the  robot  pose  terms 
are  dynamic  (while  all  landmarks  are  stationary),  F  is  defined  here  as  the  Jacobian 
of  robot  pose  states.  Specifically: 


F 


dxr(k  +  1) 
dxr(k ) 


drr(k+l) 

drr(k-\-l) 

drr(k+l) 

drr(k-\-l) 

drr(k) 

d^r{k) 

dvr(k) 

dcjr{k) 

d&r(k+l) 

d^/r(k-\-l) 

d^r(k-\- 1) 

d^r(k-\-l) 

drr(k) 

d^r(k) 

dvr(k) 

dcjr(k) 

(5.22) 

dvr(k-\- 1) 

dvr(k-\-l) 

dvr(k-\- 1) 

dvr(k-\- 1) 

dvr(k) 

dVr(k) 

dvr(k) 

dcjr(k) 

dujr(k+l) 

dujr{k-\-l) 

d<jjr(k+ 1) 

dujr(k-\- 1) 

drr(k) 

d^r{k) 

dvr(k) 

dcjr(k)  _ 

I  0  IAt  0 

0  1  0  IAt 

0  0  1  0 

0  0  0  I 


(5.23) 


(5.24) 


The  matrix  Q (k)  is  the  covariance  of  the  process  noise,  which,  from  the  process 
model,  has  the  following  form: 


Q  (k)  =  FQFTAt  = 


A  f3Sa 

0 

A  f2Sa 

0 

0 

A  f3Sa 

0 

A  f2Sa 

A  f2Sa 

0 

A iSa 

0 

0 

A  f2Sa 

0 

A  ,t£a 
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(5.25) 


Recall  that  all  pose  variables  are  defined  in  the  Earth  frame,  while  we  assume 
that  the  error  covariances  are  independent  only  in  the  body  frame.  This  means 
that  the  covariance  matrices  and  S,,:  are  not  diagonal.  Indeed,  we  define  two 
corresponding  diagonal  covariance  matrices  in  the  Body  frame,  £a'  and  £a',  and 
apply  a  change  of  basis  between  the  Body  and  Earth  frames  using  the  rotation 
matrix  Rr/;: 


Sa  —  Re65ja/R^) 


(5.26) 


5.5.2  Update  stage 

In  the  update  stage  at  timestep  k  +  1,  we  have  the  measurement  vector  z(k')  = 
\ziT,  ...,  zmt]T  containing  the  (partial)  measurement  of  M  map  curves,  and  the 
estimated  state  and  covariance  matrix  are  updated  using  the  Jacobian  H  of  the 
process  model  h(x): 


x(/c  +  1  \k  +  1) 

=  x(/c  +  1 1 A;)  +  K (k)  [z (k)  —  h(x(/c  +  l\k))] 

(5.27) 

m 

=  HP(k  +  l\k)UT  +  R(k) 

(5.28) 

K  (k) 

=  P(k  +  1  /c)HtS(A;)-1 

(5.29) 

P(k  +  1  \k  +  1) 

=  P{k  +  l\k)-K(k)S(k)K(k) 

(5.30) 

Here,  R {k)  is  the  covariance  matrix  of  the  measurement  noise,  which  we  also 
assume  is  zero  except  for  along  the  diagonals. 

We  can  derive  the  observation  Jacobian  as  follows: 


dz± 

dz  i 

dz\ 

dxr 

dx  i 

dx  jv 

dz2 

dz2 

dz? 

dx.r 

dx  i 

dxjv 

dzM 

dzM 

dzM 

-  dx.r 

dx  i 

dx^ 

(5.31) 


The  rotation  and  transformation  matrices  as  used  in  the  SLAM  observation 
equations  are  given  by: 
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(5.32) 


X5xr 
U  yr 

Since  each  (split)  measurement  corresponds  one  to  one  with  a  segment  from  one 
single  curve,  most  of  the  partial  derivatives  with  respect  to  state  curves  are  zero. 
Suppose  that  measurement  z,:  corresponds  to  state  curve  x/,  with  curve  splitting 
matrix  S.  Then,  for  each  measured  curve  z.,  we  have: 
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s  0 
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0,  j  ±  k 


(5.33) 

(5.34) 

(5.35) 

(5.36) 

(5.37) 

(5.38) 


All  other  terms  in  the  Jacobian  are  zero.  It  is  important  to  note  that  since  the 
curve  measurements  are  in  the  Local  Ground  frame,  they  are  independent  of  the 
out-of-plane  pose  variables. 
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CHAPTER  6 


RESULTS 


This  chapter  presents  the  results  obtained  using  the  proposed  algorithm.  Vision- 
based  results  are  presented  in  order  to  demonstrate  the  effectiveness  of  the  algo¬ 
rithm  in  real  environments,  and  Monte  Carlo  simulation  results  are  used  to  analyze 
the  accuracy  and  consistency  of  the  SLAM  algorithm. 


6.1  Vision  Results 

To  demonstrate  effectiveness  with  real  data,  three  paths,  with  lengths  ranging  up 
to  100m,  were  mapped  using  the  algorithm.  Vision  data  was  obtained  using  a 
stereo  camera  rig  with  a  fixed  baseline  of  55cm.  The  estimated  maps  are  shown 
in  Figure  6.1  overlaid  on  Google  satellite  imagery.  Over  these  distances,  the  drift 
of  the  maps  and  trajectories  are  on  the  order  of  2-5m.  The  second  and  third  paths 
are  particularly  challenging  due  to  the  lack  of  clear  path  edges;  nonetheless,  the 
algorithm  is  able  to  yield  a  reasonable  map.  In  the  second  example,  the  algorithm 
also  deals  with  a  number  of  successive  frames  without  a  good  curve  measurement, 
where  the  pose  is  updated  based  on  the  prediction  alone.  Here,  the  trajectory  is 
discontinuous  when  it  finally  receives  an  update,  but  the  map  still  remains  contin¬ 
uous  (Figure  6.1(b)). 

Such  environments  could  not  be  mapped  with  laser  ranging  sensors,  since  there 
would  be  few  laser  returns  from  path  edges.  With  the  path  in  Figure  6.1(b)  par¬ 
tially  obstructed  by  overhanging  trees,  even  satellite  imagery  cannot  produce  ad¬ 
equate  detail.  More  significantly,  the  demonstrated  level  of  accuracy  is  achieved 
with  few  states,  only  using  curve  structures  that  are  integral  to  the  mapping  re¬ 
quirements.  Ultimately,  navigation  in  path  environments  is  possible  without  uti¬ 
lizing  point  features,  with  the  edges  of  the  path  alone. 

The  results  demonstrate  that  a)  the  curve  fitting  algorithm  can  extract  real  world 
curves  from  edge  points  in  a  stereo  image  pair,  and  b)  the  EKF-based  CurveSLAM 
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(a)  Mapping  of  a  path  near  Talbot  Laboratory, 
UIUC,  Urbana,  IL  (30m) 


(b)  More  challenging  path  in  Crystal  Lake  Park,  Urbana,  IL  (50m).  The  algorithm  recovers  from 
a  series  of  frames  without  good  path  curve  measurements  (discontinuity  shown) 


(c)  Longer  path  in  Crystal  Lake  Park,  Urbana,  IL  (100m). 

Figure  6.1:  Vision  results  on  three  paths,  with  varying  length  and  difficulty 
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formulation  can  process  these  measurements  into  a  cumulative  pose  and  map  es¬ 
timate.  The  algorithm  is  fully  autonomous,  and  currently  operates  at  5-10  Hz  on 
a  laptop  with  2.3  GHz  Pentium  Dual-core  processor;  this  could  be  improved  with 
adequate  code  optimization. 


6.2  Simulation  Results 

To  examine  the  effectiveness  of  the  CurveSLAM  algorithm,  simulations  were  per¬ 
formed  using  three  sample  environments.  Two  main  sources  of  error  were  incor¬ 
porated  as  additive  Gaussian  noise:  error  in  the  detected  edge  pixels  with  standard 
deviation  ap  —  2  pixels,  and  error  in  the  estimated  data  association  parameter  (t- 
value)  of  standard  deviation  crda  =  0.1.  These  are  both  exaggerated  estimates  of 
the  error  encountered  in  real  environments:  the  simulated  noisy  edge  points  are 
scattered  more  than  edge  points  in  real  images  (Figure  6.2),  while  an  error  of  0.1 
in  the  matching  parameter  means  the  curve  matching  is  incorrect  by  a  tenth  of  the 
curve  length. 


Figure  6.2:  Typical  edge  pixels  in  real  environment  (left),  and  typical  edge  pixels 
in  simulation  (right) 

In  each  environment,  the  vehicle  travelled  two  loops  (for  a  total  distance  of  80m, 
120m,  and  200m  respectively),  and  loop  closure  was  formulated  as  the  solution  to 
the  constrained  optimization  problem: 


min  f(xc)  =  (xc-xu)2P  1  (xc  -  xu) 

Xc 

subject  to  h(xc)  =  0  (6.1) 

Here,  the  unconstrained  state  is  given  by  xu,  while  the  constrained  state  (following 
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loop  closure)  is  given  by  xc.  The  constraint  equations  h(x)  =  0  specify  the 
completion  of  the  loop  (i.e.,  xr  =  0 ,yr  —  0,  and  =  0).  The  solution  approach 
for  this  problem  is  found  in  [37]. 

The  constructed  map,  as  well  as  the  vehicle  localization  errors,  are  shown  in 
Figures  6.3,  6.4,  and  6.5. 

The  mapping  results  demonstrate  the  effectiveness  of  CurveSLAM.  While  some 
drift  is  to  be  expected,  the  resulting  estimation  of  the  maps  are  reasonably  accu¬ 
rate.  The  largest  localization  errors  arise  in  the  planar  variables  (x,  y,  and  w), 
which  can  be  expected,  since  the  remaining  pose  variables  are  measured  directly 
from  the  curve  fitting  process.  The  estimated  maps  are  nearly  as  accurate  as  those 
obtained  in  simulation  by  Pedraza  et  al.  [31].  Notably,  however,  our  simulations 
consider  data  association  error  as  well,  and  do  not  utilize  vehicle  odometry,  with 
the  linear  and  angular  velocities  also  estimated  within  the  EKF.  It  must  also  be 
noted  that  while  laser  range  finders  are  renown  for  high  levels  of  accuracy,  the 
accuracy  of  vision  data  tends  to  be  lower:  this  is  a  compromise  for  the  portability 
and  richness  of  information  that  a  camera  can  provide,  and  one  of  the  challenges 
of  Visual  SLAM.  Nonetheless,  the  simulation  results  are  comparable  to  [31]. 


6.2.1  Consistency  Analysis 

EKF-SLAM  consistency  has  been  studied  extensively  in  the  literature  [38] [39] [40], 
with  various  suggestions  to  improve  filter  consistency. 

When  the  true  vehicle  state  is  known  (as  is  the  case  in  simulation),  the  well 
known  Normalised  Estimation  Error  Squared  (NEES)  can  be  used  to  character¬ 
ize  filter  performance  and  consistency.  It  is  the  error  squared  normalized  by  the 
covariance,  given  by: 

6k  =  (x(/c|A;)  —  x(/c|A;))?  P(k\k)^1  (x(A;|/c)  —  x(/c|/c))  (6.2) 

Under  the  hypothesis  that  the  filter  is  consistent  and  approximately  Linear- 
Gaussian  (an  assumption  for  all  EKF-based  SLAM  algorithms),  follows  the 
distribution  of  the  sum-square  of  dim  (x/j  standard  random  variables,  which  is  a 
X2  distribution,  with  the  same  number  of  degrees  of  freedom  as  the  vehicle,  which 
is,  in  our  case,  6DOF.  Then,  by  running  N  Monte  Carlo  runs,  and  as  N  approaches 
infinity,  the  expected  value  of  the  NEES  is  E  [e*,]  =  dim  (x(A'))  =  6  [38].  With 
50  Monte  Carlo  runs,  we  have  a  95%  confidence  interval  of  [5.08,  7.00].  That  is, 
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we  can  be  95%  certain  that  the  filter  behaves  as  a  consistent  Linear-Gaussian  es¬ 
timator  if  the  average  value  of  the  NEES  over  50  runs  remains  within  this  range 
for  the  whole  simulation  duration.  If  the  NEES  is  below  this  interval,  the  estimate 
of  the  covariance  is  conservative,  while  if  the  NEES  is  above  this  interval,  the 
covariance  estimate  is  optimistic  (i.e.,  underestimated). 

Thus,  we  set  up  our  Monte  Carlo  runs  as  follows:  the  robot  travels  through 
the  three  simulated  environments,  and  the  NEES  is  recorded  over  50  Monte  Carlo 
runs.  For  each  run,  the  algorithm  is  initialized  with  a  different  seed  for  the  Ran¬ 
dom  Number  Generator,  thereby  sampling  the  entire  error  space  as  the  number  of 
runs  approaches  infinity.  The  plots  are  shown  in  Figure  6.6. 

The  NEES  plots  in  Figure  6.6  illustrate  the  consistency  properties  of  the  CurveS- 
LAM  algorithm.  A  higher  NEES  value  is  undesirable,  since  it  indicates  an  opti¬ 
mistic  covariance  estimate  and  filter  inconsistency.  There  are  no  obvious  symp¬ 
toms  of  this  in  the  map,  such  as  divergence  or  “jumps”  in  the  vehicle  trajectory 
estimate,  but  with  a  gradually  inconsistent  filter,  this  is  a  possibility  over  much 
larger  distances,  a  potential  limitation  of  any  EKF-based  approach. 

As  indicated  by  the  initial  low  NEES  values,  the  covariance  estimates  begin 
conservatively,  and  then  remain  in  the  vicinity  of  the  95%  confidence  interval.  A 
spike  occurs  as  the  vehicle  commences  its  second  loop,  but  the  NEES  quickly 
reduces  back  to  reasonable  levels.  Indeed,  the  only  extended  period  for  which  the 
covariance  estimate  is  optimistic  (i.e.,  the  NEES  is  high)  is  during  the  vehicle’s 
second  loop,  when  it  revisits  areas  it  has  previously  observed.  These  consistency 
results  are  an  improvement  over  those  of  Pedraza  et  al.  [31],  in  terms  of  the  peak 
NEES  value  and  the  duration  of  filter  inconsistency.  Particularly,  it  takes  longer 
before  the  NEES  remains  beyond  the  95%  confidence  interval,  and  the  peak  NEES 
is  at  least  an  order  of  magnitude  lower. 
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Estimated  Map 


Position  errors  (m) 


Figure  6.3:  Simulation  results  for  map  1:  estimated  and  true  map  (top),  position 
error  (center),  and  orientation  error  (bottom). 
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Figure  6.4:  Simulation  results  for  map  2:  estimated  and  true  map  (top),  position 
error  (center),  and  orientation  error  (bottom). 
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Figure  6.5:  Simulation  results  for  map  3:  estimated  and  true  map  (top),  position 
error  (center),  and  orientation  error  (bottom). 
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Figure  6.6:  Monte  Carlo  consistency  results  for  maps  1  to  3  (top  to  bottom), 
showing  the  average  NEES  over  50  runs 
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CHAPTER  7 


CONCLUSIONS 


This  thesis  has  presented  a  technique  to  incorporate  higher  level  curve  structures 
into  visual  SLAM.  Simulation  results  suggest  that  the  CurveSLAM  formulation 
can  produce  accurate  mapping  results,  while  experimental  results  demonstrate  the 
effectiveness  of  the  curve  fitting  and  CurveSLAM  algorithms  with  real  data.  In 
both  cases,  the  proposed  algorithm  can  produce  structured,  uncluttered  maps  and 
provide  good  navigation  results  with  a  much  smaller  state  space  than  most  point- 
based  visual  SLAM  techniques.  Further,  the  algorithm  can  be  useful  in  areas  in 
which  laser  range  finding  techniques  will  fail  (for  example,  mapping  the  edges 
of  a  path).  Monte  Carlo  simulation  results  show  that  the  proposed  technique  can 
maintain  consistency,  offering  an  improvement  over  previous  work. 

Nonetheless,  further  work  is  needed  to  ensure  that  the  method  is  effective  in 
producing  accurate  and  consistent  estimates  over  large  distances. 


7.1  Future  Work 

Current  work  is  focused  on  obtaining  consistent  mapping  results  over  larger  dis¬ 
tances  and  in  a  range  of  environments.  Future  tasks  include  (but  are  not  limited 
to)  the  following: 

1.  Replacing  the  iterative  curve  fitting  technique  with  an  analytical  approach 

Previous  work  in  curve-based  reconstruction  offers  analytical  solutions  for 
algebraic  curves,  but  these  can  be  difficult  to  utilize  computationally  in  the 
proposed  SLAM  framework.  To  our  best  knowledge,  there  is  no  existing 
analytical  technique  for  stereo  projective  reconstruction  of  Bezier  curves, 
and  such  a  method  would  be  ideal  for  this  algorithm. 

2.  Extending  the  algorithm  to  admit  non-planar  curves 
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Our  experiments  demonstrated  that  the  assumption  of  ground  planarity  is 
reasonable,  and  that  the  proposed  algorithm  is  even  able  to  provide  map¬ 
ping  and  localization  accuracy  in  environments  without  a  precisely  planar 
ground.  Nonetheless,  there  is  some  benefit  to  extending  this  approach  to 
non-planar  curves.  Firstly,  by  allowing  the  use  of  curves  outside  the  ground 
region,  other  significant  structures  can  be  mapped  (eg.  trees).  Secondly,  this 
allows  for  additional  observability  in  the  motion  of  the  vehicle,  and  allow 
for  reasonable  estimates  even  when  the  ground  plane  is  not  in  view. 

3.  Improvement  of  algorithm  consistency 

This  may  be  achieved  using  submap  techniques  or  utilizing  techniques  high¬ 
lighted  in  the  SLAM  consistency  literature,  such  as  by  using  the  First  Ever 
Jacobians  (FEJ)  method  [39],  or  by  utilizing  bearing  only  measurements 
[40].  Alternatively,  another  option  would  be  to  modify  the  SLAM  algorithm 
to  better  account  for  the  nonlinear  observation  and  process  model  (such  as 
using  an  estimator  based  on  the  FastSLAM  algorithm  [41]  ). 

4.  Application  to  Path  Planning  and  Control 

The  mapping  and  navigation  results  shown  would  be  quite  useful  when  de¬ 
veloping  motion  planning  and  control  algorithms.  With  the  boundaries  of 
the  path  continually  maintained  by  the  SLAM  algorithm,  a  novel  planning 
/  control  algorithm  could  be  used  to  take  into  account  this  additional  useful 
information. 
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